Skip to content

Merge master to feature branch #6203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

minglumlu
Copy link
Member

No description provided.

kc284 and others added 30 commits December 12, 2024 01:32
- Moved notes on error structure to the Wire protocol.
- Moved duplicate file basics.md and wire-protocol.md from ocaml/doc to doc/content/xen-api.
- Moved notes on VM boot parameters to doc/content/xen-api/topics/vm-lifecycle.md and removed ocaml/doc/vm-lifecycle.md.

Signed-off-by: Konstantina Chremmou <[email protected]>
This includes the current_domain_type field, which is important for live
imports, including those during a cross-pool live migration.

Signed-off-by: Rob Hoes <[email protected]>
The target host of a live migration is running by definition the same or
a newer version of the software compared to the source host. As CPU
checks are often extended or even changed in software updates, it is
best to perform such checks on the target host. However, currently it is
the source host that does these checks. This patch moves the check to
the target host.

A cross-pool live migration always begins with a dry-run VM-metadata
import on the target host. This is where checks for free memory and GPU
capacity are already carried out. The metadata import handler is
extended to accept a `check_cpu` query parameter to signal that a CPU
check is needed. This is included in the import call done in
`assert_can_migrate` on the source host, and the old CPU check in there
is dropped.

Source hosts without this patch will still perform the CPU checks
themselves, so we do not compromise safety.

NOTE: This is a rebase of the initial work from 07a2a71
that had to be reverted with Ming's suggestion to skip CPUID check
on 'unspecified' snapshots implemented.

Signed-off-by: Rob Hoes <[email protected]>
CPU checks are needed only for running VMs that are being migrated, to
check for compatibility with the remote-host's CPUs.

NOTE: This is the rebase of the initial work from 3d039f3
that had to be reverted with the fix from df7cbfd incorporated.

Signed-off-by: Rob Hoes <[email protected]>
…CPU check to the target host (xapi-project#6175)

Rebased the work from 2023 merged in xapi-project#5111 and xapi-project#5132, that caused issues
and was partially fixed in xapi-project#5148, but was completely reverted in xapi-project#5147.
I've integrated the fix from xapi-project#5148 and additionally the fix suggested by
@minglumlu in CA-380715 that was not merged at the time due to time
constraints.

This series passed the tests that were originally failing: sxm-unres
(Job ID 4177739), vGPUSXMM60CrossPool (4177750), and also passed the
Ring3 BST+BVT (209341). I can run more migration tests if needed - I've
heard @Vincent-lau has requested for these to be separated into its own
suite instead of being only in Core and Distribution regression tests.
Regarding the docs changes, I moved the copies of `basics.md`,
`wire-protocol.md`, and `vm-lifecycle.md` from `ocaml/doc` to
`doc/content/xen-api`. The latter folder had these files already, so the
resulting changes on them are the following:

- `basics.md` has formatting changes only.
- `vm-lifecycle.md` has now the additional section on VM boot
parameters.
- `wire-protocol.md` has formatting changes, the sections for JSON-RPC,
and a section on the error format which previously was in the API errors
markdown (and was absent from the github.io docs since we don't list the
errors anywhere; we could add this list in a future PR).
It is expected to use root CA certficate to verify an appliance's server
certificate for a xapi outgoing TLS connection.

Prior to this change, the related stunnel configurations are:
"verifyPeer=yes", and "checkHost=<hostname>".

The 'verifyPeer' option of stunnel doesn't treat the CA bundle as root
CA certificates. The 'checkHost' option of stunnel only checks the
host name against the one in server certificate. In other words, the
issue is that the root CA based checking doesn't work for appliance.

This change adds 'verifyChain' for the appliance to ensure the outgoing
TLS connection from xapi will verify the appliance's server certificates
by real root CA certificate.

Signed-off-by: Ming Lu <[email protected]>
Precompute a table of object names for which events should be
propagated. This avoids the list querying done every time the database
queues an event.

Signed-off-by: Colin James <[email protected]>
Instead of logging errors like this and then immediately handling them:
```
[error|1141 |VM metadata import |import]
Found no VDI with location = dedeeb44-62b3-460e-b55c-6de45ba10cc0:
treating as fatal and abandoning import

[debug|1141 |VM metadata import |import]
Ignoring missing disk Ref:4 -
this will be mirrored during a real live migration.
```

Log once in the handler:
```
[ warn|VM metadata import |import]
Ignoring missing disk Ref:16 -
this will be mirrored during a real live migration.
(Suppressed error: 'Found no VDI with location = c208b47c-cf87-495f-bd3c-a4bc8167ef83')
```

Signed-off-by: Andrii Sultanov <[email protected]>
Instead of raising an exception in case of an error like get_by_uuid,
return None to be handled gracefully later.

Do not expose it in the datamodel.

This will later be used when an object is checked to exist before its creation
(during migration, for example), and so its absence is expected - no need to
raise a backtrace and pollute the logs with errors.

Signed-off-by: Andrii Sultanov <[email protected]>
…failure is expected

Migration logs are always full of exceptions that are expected and immediately
handled:
```
[error|backtrace] SR.get_by_uuid D:8651cc0c9fb6 failed with exception Db_exn.Read_missing_uuid("SR", "", "a94bf103-0169-6d70-8874-334261f5098e")
[error|backtrace] Raised Db_exn.Read_missing_uuid("SR", "", "a94bf103-0169-6d70-8874-334261f5098e")
[error|backtrace] 1/9 xapi Raised at file ocaml/database/db_cache_impl.ml, line 237
[error|backtrace] 2/9 xapi Called from file ocaml/xapi/db_actions.ml, line 13309
[error|backtrace] 3/9 xapi Called from file ocaml/xapi/rbac.ml, line 188
[error|backtrace] 4/9 xapi Called from file ocaml/xapi/rbac.ml, line 197
[error|backtrace] 5/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 74
[error|backtrace] 6/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 96
[error|backtrace] 7/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
[error|backtrace] 8/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
[error|backtrace] 9/9 xapi Called from file ocaml/libs/log/debug.ml, line 250
[ warn|import] Failed to find SR with UUID: a94bf103-0169-6d70-8874-334261f5098e content-type: user - will still try to find individual VDIs
[....]
[debug|import] Importing 1 VM_guest_metrics(s)
[debug|import] Importing 1 VM_metrics(s)
[debug|import] Importing 1 VM(s)
[debug|import] Importing 1 network(s)
[debug|import] Importing 0 GPU_group(s)
[debug|import] Importing 1 VBD(s)
[error|backtrace] VBD.get_by_uuid D:3a12311e8be4 failed with exception Db_exn.Read_missing_uuid("VBD", "", "026d61e9-ed8a-fc72-7fd3-77422585baff")
[error|backtrace] Raised Db_exn.Read_missing_uuid("VBD", "", "026d61e9-ed8a-fc72-7fd3-77422585baff")
[error|backtrace] 1/9 xapi Raised at file ocaml/database/db_cache_impl.ml, line 237
[error|backtrace] 2/9 xapi Called from file ocaml/xapi/db_actions.ml, line 14485
[error|backtrace] 3/9 xapi Called from file ocaml/xapi/rbac.ml, line 188
[error|backtrace] 4/9 xapi Called from file ocaml/xapi/rbac.ml, line 197
[error|backtrace] 5/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 74
[error|backtrace] 6/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 96
[error|backtrace] 7/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
[error|backtrace] 8/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
[error|backtrace] 9/9 xapi Called from file ocaml/libs/log/debug.ml, line 250
[debug|import] Importing 1 VIF(s)
[error|backtrace] VIF.get_by_uuid D:2bc78449e0bc failed with exception Db_exn.Read_missing_uuid("VIF", "", "7d14aee4-47a4-e271-4f64-fe9f9ef6d50b")
[error|backtrace] Raised Db_exn.Read_missing_uuid("VIF", "", "7d14aee4-47a4-e271-4f64-fe9f9ef6d50b")
[error|backtrace] 1/9 xapi Raised at file ocaml/database/db_cache_impl.ml, line 237
[error|backtrace] 2/9 xapi Called from file ocaml/xapi/db_actions.ml, line 10813
[error|backtrace] 3/9 xapi Called from file ocaml/xapi/rbac.ml, line 188
[error|backtrace] 4/9 xapi Called from file ocaml/xapi/rbac.ml, line 197
[error|backtrace] 5/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 74
[error|backtrace] 6/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 96
[error|backtrace] 7/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
[error|backtrace] 8/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
[error|backtrace] 9/9 xapi Called from file ocaml/libs/log/debug.ml, line 250
```

Use an internal get_by_uuid_opt call and match on the Option instead,
with the logs looking much clearer:
```
[debug|import] Importing 1 host(s)
[debug|import] Importing 2 SR(s)
[ warn|import] Failed to find SR with UUID: 8568e308-c61c-3b10-3953-45606cfecede content-type:  - will still try to find individual VDIs
[ warn|import] Failed to find SR with UUID: 40e9e252-46ac-ed3d-7a4c-6db175212195 content-type: user - will still try to find individual VDIs
[...]
[debug|import] Importing 2 VM_guest_metrics(s)
[debug|import] Importing 2 VM(s)
[debug|import] Importing 1 network(s)
[debug|import] Importing 1 GPU_group(s)
[debug|import] Importing 4 VBD(s)
[ info|import] Did not find an already existing VBD with the same uuid=569d0e60-6a89-d1fa-2ed6-38b8eebe9065, try to create a new one
[ info|import] Did not find an already existing VBD with the same uuid=533306da-cff1-7ada-71f7-2c4de8a0065b, try to create a new one
[ info|import] Did not find an already existing VBD with the same uuid=f9dec620-0180-f67f-6711-7f9e5222a682, try to create a new one
[ info|import] Did not find an already existing VBD with the same uuid=05e55076-b559-9b49-c247-e7850984ddae, try to create a new one
[debug|import] Importing 2 VIF(s)
[ info|import] Did not find an already existing VIF with the same uuid=a5a731d5-622c-5ca5-5b2a-a0053a11ef07, try to create a new one
[ info|import] Did not find an already existing VIF with the same uuid=1738bf20-8d16-0d69-48cd-8f3d9e7ea791, try to create a new one
```

Signed-off-by: Andrii Sultanov <[email protected]>
…i-project#6187)

It is expected to use root CA certficate to verify an appliance's server
certificate for a xapi outgoing TLS connection.

Prior to this change, the related stunnel configurations are:
"verifyPeer=yes", and "checkHost=<hostname>".

The 'verifyPeer' option of stunnel doesn't treat the CA bundle as root
CA certificates. The 'checkHost' option of stunnel only checks the
host name against the one in server certificate. In other words, the
issue is that the root CA based checking doesn't work for appliance.

This change adds 'verifyChain' for the appliance to ensure the outgoing
TLS connection from xapi will verify the appliance's server certificates
by real root CA certificate.
Signed-off-by: Colin James <[email protected]>
Signed-off-by: Colin James <[email protected]>
Signed-off-by: Colin James <[email protected]>
To avoid recomputing the symmetric closure several times during module
initialisation for Eventgen, we introduce a hashtable that stores the
relation.

Signed-off-by: Colin James <[email protected]>
Best reviewed by commit - the first one does not depend on database
changes, only the last one does.

In short, simplifies and clarifies migration logs from this:
```
[error|1141 |VM metadata import |import]
Found no VDI with location = dedeeb44-62b3-460e-b55c-6de45ba10cc0:
treating as fatal and abandoning import

[debug|1141 |VM metadata import |import]
Ignoring missing disk Ref:4 -
this will be mirrored during a real live migration.
```
to this:
```
[ warn|VM metadata import |import]
Ignoring missing disk Ref:16 -
this will be mirrored during a real live migration.
(Suppressed error: 'Found no VDI with location = c208b47c-cf87-495f-bd3c-a4bc8167ef83')
```

And from this:
```
[error|backtrace] SR.get_by_uuid D:8651cc0c9fb6 failed with exception Db_exn.Read_missing_uuid("SR", "", "a94bf103-0169-6d70-8874-334261f5098e")
[error|backtrace] Raised Db_exn.Read_missing_uuid("SR", "", "a94bf103-0169-6d70-8874-334261f5098e")
[error|backtrace] 1/9 xapi Raised at file ocaml/database/db_cache_impl.ml, line 237
[error|backtrace] 2/9 xapi Called from file ocaml/xapi/db_actions.ml, line 13309
[error|backtrace] 3/9 xapi Called from file ocaml/xapi/rbac.ml, line 188
[error|backtrace] 4/9 xapi Called from file ocaml/xapi/rbac.ml, line 197
[error|backtrace] 5/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 74
[error|backtrace] 6/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 96
[error|backtrace] 7/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
[error|backtrace] 8/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
[error|backtrace] 9/9 xapi Called from file ocaml/libs/log/debug.ml, line 250
[ warn|import] Failed to find SR with UUID: a94bf103-0169-6d70-8874-334261f5098e content-type: user - will still try to find individual VDIs
[....]
[debug|import] Importing 1 VM_guest_metrics(s)
[debug|import] Importing 1 VM_metrics(s)
[debug|import] Importing 1 VM(s)
[debug|import] Importing 1 network(s)
[debug|import] Importing 0 GPU_group(s)
[debug|import] Importing 1 VBD(s)
[error|backtrace] VBD.get_by_uuid D:3a12311e8be4 failed with exception Db_exn.Read_missing_uuid("VBD", "", "026d61e9-ed8a-fc72-7fd3-77422585baff")
[error|backtrace] Raised Db_exn.Read_missing_uuid("VBD", "", "026d61e9-ed8a-fc72-7fd3-77422585baff")
[error|backtrace] 1/9 xapi Raised at file ocaml/database/db_cache_impl.ml, line 237
[error|backtrace] 2/9 xapi Called from file ocaml/xapi/db_actions.ml, line 14485
[error|backtrace] 3/9 xapi Called from file ocaml/xapi/rbac.ml, line 188
[error|backtrace] 4/9 xapi Called from file ocaml/xapi/rbac.ml, line 197
[error|backtrace] 5/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 74
[error|backtrace] 6/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 96
[error|backtrace] 7/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
[error|backtrace] 8/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
[error|backtrace] 9/9 xapi Called from file ocaml/libs/log/debug.ml, line 250
[debug|import] Importing 1 VIF(s)
[error|backtrace] VIF.get_by_uuid D:2bc78449e0bc failed with exception Db_exn.Read_missing_uuid("VIF", "", "7d14aee4-47a4-e271-4f64-fe9f9ef6d50b")
[error|backtrace] Raised Db_exn.Read_missing_uuid("VIF", "", "7d14aee4-47a4-e271-4f64-fe9f9ef6d50b")
[error|backtrace] 1/9 xapi Raised at file ocaml/database/db_cache_impl.ml, line 237
[error|backtrace] 2/9 xapi Called from file ocaml/xapi/db_actions.ml, line 10813
[error|backtrace] 3/9 xapi Called from file ocaml/xapi/rbac.ml, line 188
[error|backtrace] 4/9 xapi Called from file ocaml/xapi/rbac.ml, line 197
[error|backtrace] 5/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 74
[error|backtrace] 6/9 xapi Called from file ocaml/xapi/server_helpers.ml, line 96
[error|backtrace] 7/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
[error|backtrace] 8/9 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
[error|backtrace] 9/9 xapi Called from file ocaml/libs/log/debug.ml, line 250
```

to this:
```
[debug|import] Importing 1 host(s)
[debug|import] Importing 2 SR(s)
[ warn|import] Failed to find SR with UUID: 8568e308-c61c-3b10-3953-45606cfecede content-type:  - will still try to find individual VDIs
[ warn|import] Failed to find SR with UUID: 40e9e252-46ac-ed3d-7a4c-6db175212195 content-type: user - will still try to find individual VDIs
[...]
[debug|import] Importing 2 VM_guest_metrics(s)
[debug|import] Importing 2 VM(s)
[debug|import] Importing 1 network(s)
[debug|import] Importing 1 GPU_group(s)
[debug|import] Importing 4 VBD(s)
[ info|import] Did not find an already existing VBD with the same uuid=569d0e60-6a89-d1fa-2ed6-38b8eebe9065, try to create a new one
[ info|import] Did not find an already existing VBD with the same uuid=533306da-cff1-7ada-71f7-2c4de8a0065b, try to create a new one
[ info|import] Did not find an already existing VBD with the same uuid=f9dec620-0180-f67f-6711-7f9e5222a682, try to create a new one
[ info|import] Did not find an already existing VBD with the same uuid=05e55076-b559-9b49-c247-e7850984ddae, try to create a new one
[debug|import] Importing 2 VIF(s)
[ info|import] Did not find an already existing VIF with the same uuid=a5a731d5-622c-5ca5-5b2a-a0053a11ef07, try to create a new one
[ info|import] Did not find an already existing VIF with the same uuid=1738bf20-8d16-0d69-48cd-8f3d9e7ea791, try to create a new one
```
The current constraint is that the VIF used for PVS proxy must have
device number 0. It turned out that this can be relaxed. It is
sufficient to enforce that the VIF is the one with the lowest device
number for the VM.

Signed-off-by: Rob Hoes <[email protected]>
When creating a new VIF and there is already a VIF with PVS_proxy, check
that the new VIF does not have a lower device number than the PVS_proxy
VIF.

Signed-off-by: Rob Hoes <[email protected]>
The current constraint is that the VIF used for PVS proxy must have
device number 0. It turned out that this can be relaxed. It is
sufficient to enforce that the VIF is the one with the lowest device
number for the VM.
We have seen failures where a service file unexpectedly exists. It could
have been left behind but a failed stop but we don't have evidence for
that. To help with this, provide more details of the file found.

Signed-off-by: Christian Lindig <[email protected]>
This is a small effort to simplify the code within the `Eventgen`
module. The previous style is very unwieldy and makes the file appear
more complicated on the surface than it really is.

---

Passes BVT+BST (209475) but would appreciate review comments as I've
dropped the potential for stdout logging in some places (easy to add
back) and added commentary that may not be fully accurate.
…oject#6190)

We have seen failures where a service file unexpectedly exists. It could
have been left behind but a failed stop but we don't have evidence for
that. To help with this, provide more details of the file found.
We have seen swtpm systemd service files not being removed. We now call
Fe_systemctl.stop even when the servive is potentially not running to
ensure clean up is happening regardless.

Signed-off-by: Christian Lindig <[email protected]>
lindig and others added 2 commits December 20, 2024 15:52
We have seen swtpm systemd service files not being removed. We now make
Fe_systemctl.stop callable when the servive is potentially not running
to ensure clean up is happening regardless.
@minglumlu
Copy link
Member Author

No changes on the merge point:

$ git show 67381fbd4
commit 67381fbd40efc1200002ff7a09496f327450e740 (HEAD -> private/mingl/merge_master_to_feature)
Merge: a889151b7 9eeb1f3f8
Author: Ming Lu <[email protected]>
Date:   Thu Jan 2 16:17:52 2025 +0800

    Merge branch 'master' into private/mingl/merge_master_to_feature

@minglumlu minglumlu merged commit 47df335 into xapi-project:feature/pool-licensing Jan 2, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants